Prof-Life-Log: Audio Environment Detection for Naturalistic Audio Streams

Authors

  • Ali Ziaei
  • Abhijeet Sangwan
  • John H. L. Hansen
Abstract

In this study, we develop a new system for real-world audio environment matching. Environment detection within unknown audio streams requires a system that operates in an unsupervised manner, since it will encounter unknown environments without prior information. In addition, the overall solution should be computationally efficient for large audio collections. In the proposed approach, a Gaussian mixture model (GMM) is trained on large amounts of unlabeled audio data and used as a background acoustic model. Subsequently, an acoustic signature vector (ASV) is computed for each environment. Here, the ASV is designed to capture the unique acoustic characteristics of an environment. Using the ASVs, we demonstrate that it is possible to compute an effective similarity measure between two acoustic environments. We demonstrate the performance of the proposed system on real-world audio data and compare it to a traditional GMM-UBM (universal background model) system. Experiments show that our system achieves an equal error rate (EER) that is 35% better than that of a baseline GMM-UBM system.
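The abstract does not say how the ASV is computed or which similarity measure is used. The sketch below is therefore only an illustration under stated assumptions: the ASV is taken to be the average posterior probability of each background-GMM component over the frames of a segment, and two environments are compared by cosine similarity. The toy model parameters, the feature values, and the function names (`posteriors`, `signature`, `cosine_similarity`) are all hypothetical, and the background GMM is assumed to be already trained.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Posterior probability of each background-GMM component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max before exponentiating for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def signature(frames, weights, means, variances):
    """Assumed ASV: mean component posterior over all frames of a segment."""
    acc = [0.0] * len(weights)
    for x in frames:
        for k, p in enumerate(posteriors(x, weights, means, variances)):
            acc[k] += p
    return [a / len(frames) for a in acc]

def cosine_similarity(u, v):
    """Cosine of the angle between two signature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical two-component background model over 2-D features.
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0, 0.0], [5.0, 5.0]]
VARIANCES = [[1.0, 1.0], [1.0, 1.0]]

# Made-up frames: two segments from one environment, one from another.
office_a = [[0.1, -0.2], [0.3, 0.1], [-0.4, 0.2]]
office_b = [[0.2, 0.0], [-0.1, -0.3], [0.0, 0.4]]
street = [[5.1, 4.8], [4.7, 5.3], [5.2, 5.0]]

asv_office_a = signature(office_a, WEIGHTS, MEANS, VARIANCES)
asv_office_b = signature(office_b, WEIGHTS, MEANS, VARIANCES)
asv_street = signature(street, WEIGHTS, MEANS, VARIANCES)

same_env = cosine_similarity(asv_office_a, asv_office_b)
diff_env = cosine_similarity(asv_office_a, asv_street)
print(same_env, diff_env)
```

Under these assumptions, each environment is reduced to one fixed-length vector regardless of segment duration, so comparing two environments costs a single dot product. That is one plausible way to meet the efficiency requirement the abstract mentions, though the authors' actual formulation may differ.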

Related resources

A speech system for estimating daily word counts

The ability to count the number of words spoken by an individual over long durations is important to researchers investigating language development, healthcare, education, etc. In this study, we attempt to build a speech system that can compute daily word counts using data from the Prof-Life-Log corpus. The task is challenging as typical audio files from Prof-Life-Log tend to be 8-to-16 hours lo...

Event Recognition for Meaningful Human-computer Interaction in a Smart Environment

The aim of this project is to monitor a room for the purposes of analysing the interactions and identities of a small set of individuals. We work with multiple uncalibrated sensors that observe a single environment and generate multimodal data streams. These streams are processed with the help of a generic client-server middleware called SmartFlow. Modules for visual motion detection, visual fa...

Privacy Protection for Life-log Video

Recent advances in wearable cameras and storage devices allow us to record the holistic human experience for an extended period of time. Such a life-log system can capture audio-visual data anywhere and at any time. It has a wide range of applications from law enforcement, journalism, medicine to personal archival. On the other hand, there is a natural apprehension towards such an intrusive sys...

Medusa - A Distributed Sound Environment

This paper introduces Medusa, a distributed sound environment that allows several machines connected in a local area network to share multiple streams of audio and MIDI, and to replace hardware mixers and also specialized multi-channel audio cables by network communication. Medusa has no centralized servers: any computer in the local environment may act as a server of audio/MIDI streams, and as...

RTSI: An Index Structure for Multi-Modal Real-Time Search on Live Audio Streaming Services

Audio streaming services (e.g., Mixlr, Ximalaya, Lizhi and Facebook Live Audio) have become increasingly popular due to the wide use of smart phones. More and more people are enjoying live audio broadcasting while they are doing various kinds of activities. On the other hand, the data volume of live audio streams is also ever increasing. Searching and indexing these audio streams is still an im...

Publication date: 2012